Sapiens Depth 2b Bfloat16
Sapiens-2B is a Vision Transformer pre-trained on 300 million high-resolution human images. This variant is optimized for human depth estimation, supports inference at 1K resolution, and generalizes well to real-world scenes.
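The "Bfloat16" in the title presumably refers to the numeric precision of the released weights. As a rough illustration of what that precision means for values such as predicted depths, the sketch below rounds a float to the nearest bfloat16 value using only the Python standard library. The helper name `to_bfloat16` is hypothetical and is not part of any Sapiens API; it only demonstrates the number format.

```python
import struct


def to_bfloat16(x: float) -> float:
    """Round a finite float to the nearest bfloat16 value (ties to even).

    Hypothetical helper for illustration only; NaN inputs are not handled.
    """
    # Reinterpret the float32 encoding of x as a 32-bit unsigned integer.
    bits = struct.unpack("<I", struct.pack("<f", x))[0]
    # bfloat16 keeps the top 16 bits of float32: 1 sign, 8 exponent, 7 mantissa.
    # Adding this bias makes the dropped low 16 bits round to nearest, ties to even.
    rounding_bias = 0x7FFF + ((bits >> 16) & 1)
    bits = (bits + rounding_bias) & 0xFFFF0000
    return struct.unpack("<f", struct.pack("<I", bits))[0]


if __name__ == "__main__":
    print(to_bfloat16(1.2345))   # 1.234375
    print(to_bfloat16(3.14159))  # 3.140625
```

With only 7 explicit mantissa bits, bfloat16 carries roughly two to three significant decimal digits while keeping the full float32 exponent range, which is why it is a common choice for storing large model weights.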